[BOLT] Optimize basic block loops to avoid n^2 loop #156243

Mark-Simulacrum · 2025-08-31T13:14:01Z

This reduces BOLT runtime when optimizing rustc_driver.so from 15 minutes to 7 minutes (or from 49 minutes to 37 minutes of userspace time).

llvmbot · 2025-08-31T13:14:48Z

@llvm/pr-subscribers-bolt

Author: Mark Rousskov (Mark-Simulacrum)

Changes

This improves BOLT runtime when optimizing rustc_driver.so from 15 minutes to 7 minutes (or 49 minutes to 37 minutes of userspace time).

Full diff: https://github.com/llvm/llvm-project/pull/156243.diff

1 Files Affected:

(modified) bolt/lib/Core/BinaryFunction.cpp (+13-1)

diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 6cac2d0cca2cb..a86e204cae974 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -3591,6 +3591,18 @@ void BinaryFunction::fixBranches() {
   auto &MIB = BC.MIB;
   MCContext *Ctx = BC.Ctx.get();
 
+  // Caches `FunctionLayout::nextBasicBlock(IgnoreSplits = false)`.
+  // nextBasicBlock uses linear search to find the next block, so the loop
+  // below becomes O(n^2). This avoids that.
+  DenseMap<BinaryBasicBlock *, BinaryBasicBlock *> nextBasicBlock(
+      Layout.block_size());
+  for (size_t i = 0; i + 1 < Layout.block_size(); i++) {
+    auto current = Layout.block_begin() + i;
+    auto next = Layout.block_begin() + i + 1;
+    if (next != Layout.getFragment((*current)->getFragmentNum()).end())
+      nextBasicBlock.insert(std::pair(*current, *next));
+  }
+
   for (BinaryBasicBlock *BB : BasicBlocks) {
     const MCSymbol *TBB = nullptr;
     const MCSymbol *FBB = nullptr;
@@ -3605,7 +3617,7 @@ void BinaryFunction::fixBranches() {
 
     // Basic block that follows the current one in the final layout.
     const BinaryBasicBlock *const NextBB =
-        Layout.getBasicBlockAfter(BB, /*IgnoreSplits=*/false);
+        nextBasicBlock.lookup_or(BB, nullptr);
 
     if (BB->succ_size() == 1) {
       // __builtin_unreachable() could create a conditional branch that
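
The comment in the patch attributes the quadratic behavior to a linear search inside the per-block loop. As a rough cost estimate, assume (this is an assumption, not something measured in this PR) that looking up the block at layout position $i$ of $n$ scans $i$ entries from the start of the layout. Calling that lookup once per block then costs

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} = \Theta(n^2)$$

entry visits per function. With the cache, one pass performs at most $n - 1$ insertions and the loop performs $n$ lookups, each expected constant time for a hash map, so the total is about $2n - 1 = \Theta(n)$ expected operations per function.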

paschalis-mpeis

Hey @Mark-Simulacrum,

Thanks for your patch. Tests appear to hang; however, your current version includes 69ccc39 (now reverted), which caused some issues.

Could you rebase onto the latest main so we can verify that?

Mark-Simulacrum · 2025-09-05T22:57:56Z

Rebased. Can you clarify how you're running tests, so I can iterate locally as well?

paschalis-mpeis

Hey Mark,

Thanks for rebasing. I can now verify that tests complete.

There are a few more users of getBasicBlockAfter in loops (the call you cached) that might also benefit from this.
I am not sure whether it's worth wrapping this in a helper that returns the cache, possibly parameterized by IgnoreSplits as well, so that other callers can reuse it.

You can use the ninja check-bolt target to run the unit and lit tests.

bolt/lib/Core/BinaryFunction.cpp

This enables future re-use in other code that calls getBasicBlockAfter in loops, though for now those uses aren't introduced.

Mark-Simulacrum · 2025-09-21T16:35:49Z

I split the cache creation out into a dedicated function. At least in the librustc_driver.so case, I think none of the other usages show up as hot, so it's potentially not worth building the cache upfront for them.
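
For readers following along without the updated diff, here is a minimal standalone sketch of what such a dedicated function could look like, including the IgnoreSplits parameter suggested above. It is not the code from this PR: `Block`, `NextBlockMap`, `buildNextBlockCache`, and `lookupNext` are hypothetical names, `std::unordered_map` stands in for `DenseMap`, and the sketch assumes each fragment occupies a contiguous run of the layout, so that comparing fragment numbers of neighbors matches the fragment-end check in the patch.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a basic block; only the fragment number matters.
struct Block {
  unsigned FragmentNum;
};

using NextBlockMap = std::unordered_map<const Block *, const Block *>;

// Builds the "block -> block that follows it in the layout" map in one pass.
// With IgnoreSplits == false, the last block of a fragment gets no entry,
// because its layout neighbor lives in a different fragment.
NextBlockMap buildNextBlockCache(const std::vector<Block *> &Layout,
                                 bool IgnoreSplits) {
  NextBlockMap Cache;
  Cache.reserve(Layout.size());
  for (std::size_t I = 0; I + 1 < Layout.size(); ++I) {
    const Block *Cur = Layout[I];
    const Block *Next = Layout[I + 1];
    if (IgnoreSplits || Cur->FragmentNum == Next->FragmentNum)
      Cache.emplace(Cur, Next);
  }
  return Cache;
}

// A missing key means "no following block", mirroring the nullptr default
// used with lookup_or in the patch.
const Block *lookupNext(const NextBlockMap &Cache, const Block *BB) {
  auto It = Cache.find(BB);
  return It == Cache.end() ? nullptr : It->second;
}
```

A caller would build the map once before its loop and call `lookupNext` per block. Building it lazily at each call site, rather than upfront for every function, keeps the cost away from callers that are not hot.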

maksfb · 2025-09-22T21:41:10Z

Thanks for the patch. The time it takes to run BOLT sounds excessive; even 7 minutes is a lot. Could you share the BOLT log with --time-rewrite --time-opts?

Kobzol · 2025-09-23T08:34:00Z

I ran it on Rust's CI.

This is the log for LLVM:

LLVM

2025-09-23T07:36:44.0535321Z [2025-09-23T07:36:44.052Z INFO  opt_dist::exec] Executing `/rustroot/bin/llvm-bolt /tmp/.tmpZ4BJv1 -data /tmp/tmp-multistage/opt-artifacts/LLVM-bolt.profdata -o /checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/libLLVM.so.21.1-rust-1.92.0-nightly -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -split-strategy=cdsplit -split-all-cold -jump-tables=move -icf=all -update-debug-sections -dyno-stats --time-rewrite --time-opts [at /checkout/obj]`
2025-09-23T07:36:44.0567506Z BOLT-INFO: shared object or position-independent executable detected
2025-09-23T07:36:44.0571599Z BOLT-INFO: Target architecture: x86_64
2025-09-23T07:36:44.0572022Z BOLT-INFO: BOLT version: <unknown>
2025-09-23T07:36:44.0572636Z BOLT-INFO: first alloc address is 0x0
2025-09-23T07:36:44.0573216Z BOLT-INFO: creating new program header table at address 0x7c00000, offset 0x7c00000
2025-09-23T07:36:44.0573772Z BOLT-INFO: enabling relocation mode
2025-09-23T07:36:44.4354680Z BOLT-INFO: enabling lite mode
2025-09-23T07:36:44.8656995Z BOLT-WARNING: split function detected on input : d_type.cold. The support is limited in relocation mode
2025-09-23T07:36:47.8034711Z BOLT-WARNING: Failed to analyze 1171 relocations
2025-09-23T07:36:47.8390721Z BOLT-INFO: pre-processing profile using branch profile reader
2025-09-23T07:36:58.2078225Z BOLT-WARNING: 1 collisions detected while hashing binary objects. Use -v=1 to see the list.
2025-09-23T07:36:59.5066972Z BOLT-INFO: 14891 out of 127004 functions in the binary (11.7%) have non-empty execution profile
2025-09-23T07:36:59.5067725Z BOLT-INFO: 240 functions with profile could not be optimized
2025-09-23T07:36:59.5068200Z BOLT-INFO: profile for 1 objects was ignored
2025-09-23T07:37:00.2101723Z BOLT-INFO: profile quality metrics for the hottest 1000 functions (reporting top 5% values): function CFG discontinuity 0.00%; call graph flow conservation gap 0.00%; CFG flow conservation gap 0.00% (weighted) 0.00% (worst); exception handling usage 0.00% (of total BBEC) 0.00% (of total InvokeEC)
2025-09-23T07:37:00.8293612Z BOLT-INFO: validate-mem-refs updated 1 object references
2025-09-23T07:37:00.8687575Z BOLT-INFO: 593325 instructions were shortened
2025-09-23T07:37:00.9457522Z BOLT-INFO: removed 1712 empty blocks
2025-09-23T07:37:01.4707672Z BOLT-INFO: ICF folded 1673 out of 127312 functions in 4 passes. 12 functions had jump tables.
2025-09-23T07:37:01.4708686Z BOLT-INFO: Removing all identical functions will save 292.82 KB of code space. Folded functions were called 2701464704 times based on profile.
2025-09-23T07:37:02.7062384Z BOLT-INFO: basic block reordering modified layout of 7814 functions (52.47% of profiled, 6.22% of total)
2025-09-23T07:37:02.7441908Z BOLT-INFO: UCE removed 4 blocks and 166 bytes of code
2025-09-23T07:37:03.4401755Z BOLT-INFO: splitting separates 10641700 hot bytes from 8501746 cold bytes (55.59% of split functions is hot).
2025-09-23T07:37:03.4585762Z BOLT-INFO: 164 Functions were reordered by LoopInversionPass
2025-09-23T07:38:02.2842104Z BOLT-INFO: splitting separates 5422296 hot bytes from 8471894 cold bytes (39.03% of split functions is hot).
2025-09-23T07:38:02.6756469Z BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:
2025-09-23T07:38:02.6756924Z 
2025-09-23T07:38:02.6757067Z         235104516805 : executed forward branches
2025-09-23T07:38:02.6757524Z          33013952055 : taken forward branches
2025-09-23T07:38:02.6757926Z          65383434196 : executed backward branches
2025-09-23T07:38:02.6758337Z          39036323064 : taken backward branches
2025-09-23T07:38:02.6758724Z          14888197913 : executed unconditional branches
2025-09-23T07:38:02.6759109Z          19895859861 : all function calls
2025-09-23T07:38:02.6759459Z           5214498636 : indirect calls
2025-09-23T07:38:02.6759787Z           3961463863 : PLT calls
2025-09-23T07:38:02.6760121Z        1784588195112 : executed instructions
2025-09-23T07:38:02.6760504Z         428273852366 : executed load instructions
2025-09-23T07:38:02.6760889Z         189118609977 : executed store instructions
2025-09-23T07:38:02.6761265Z           2678563147 : taken jump table branches
2025-09-23T07:38:02.6761651Z                    0 : taken unknown indirect branches
2025-09-23T07:38:02.6762013Z         315376148914 : total branches
2025-09-23T07:38:02.6762340Z          86938473032 : taken branches
2025-09-23T07:38:02.6762706Z         228437675882 : non-taken conditional branches
2025-09-23T07:38:02.6763104Z          72050275119 : taken conditional branches
2025-09-23T07:38:02.6763479Z         300487951001 : all conditional branches
2025-09-23T07:38:02.6763717Z 
2025-09-23T07:38:02.6763896Z         210977195564 : executed forward branches (-10.3%)
2025-09-23T07:38:02.6764493Z          17127806877 : taken forward branches (-48.1%)
2025-09-23T07:38:02.6764922Z          89510755437 : executed backward branches (+36.9%)
2025-09-23T07:38:02.6765354Z          40550966508 : taken backward branches (+3.9%)
2025-09-23T07:38:02.6766052Z           9850607756 : executed unconditional branches (-33.8%)
2025-09-23T07:38:02.6766480Z          19895859861 : all function calls (=)
2025-09-23T07:38:02.6766830Z           5214498636 : indirect calls (=)
2025-09-23T07:38:02.6767176Z           3961463863 : PLT calls (=)
2025-09-23T07:38:02.6767539Z        1771510302677 : executed instructions (-0.7%)
2025-09-23T07:38:02.6767949Z         428273852366 : executed load instructions (=)
2025-09-23T07:38:02.6768429Z         189118609977 : executed store instructions (=)
2025-09-23T07:38:02.6768834Z           2678563147 : taken jump table branches (=)
2025-09-23T07:38:02.6769234Z                    0 : taken unknown indirect branches (=)
2025-09-23T07:38:02.6769619Z         310338558757 : total branches (-1.6%)
2025-09-23T07:38:02.6769986Z          67529381141 : taken branches (-22.3%)
2025-09-23T07:38:02.6770399Z         242809177616 : non-taken conditional branches (+6.3%)
2025-09-23T07:38:02.6770904Z          57678773385 : taken conditional branches (-19.9%)
2025-09-23T07:38:02.6771317Z         300487951001 : all conditional branches (=)
2025-09-23T07:38:02.6771583Z 
2025-09-23T07:38:02.8707041Z BOLT-INFO: SCTC: patched 117 tail calls (113 forward) tail calls (4 backward) from a total of 117 while removing 5 double jumps and removing 120 basic blocks totalling 594 bytes of code. CTCs total execution count is 9486579 and the number of times CTCs are taken is 5413745
2025-09-23T07:38:09.1294010Z BOLT-INFO: setting __hot_start to 0x7e00000
2025-09-23T07:38:09.1294477Z BOLT-INFO: setting __hot_end to 0x8b1a287
2025-09-23T07:38:10.7843550Z ===-------------------------------------------------------------------------===
2025-09-23T07:38:10.7844184Z                                  Rewrite passes
2025-09-23T07:38:10.7844663Z ===-------------------------------------------------------------------------===
2025-09-23T07:38:10.7845839Z   Total Execution Time: 1241.3852 seconds (79.3465 wall clock)
2025-09-23T07:38:10.7846220Z 
2025-09-23T07:38:10.7846499Z    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
2025-09-23T07:38:10.7847151Z   1192.1434 ( 98.8%)  34.0971 ( 97.0%)  1226.2405 ( 98.8%)  64.2015 ( 80.9%)  run optimization passes
2025-09-23T07:38:10.7847759Z    4.5596 (  0.4%)   0.4438 (  1.3%)   5.0035 (  0.4%)   5.0036 (  6.3%)  disassemble functions
2025-09-23T07:38:10.7848284Z    4.0559 (  0.3%)   0.2574 (  0.7%)   4.3133 (  0.3%)   4.3133 (  5.4%)  emit and link
2025-09-23T07:38:10.7848814Z    3.2442 (  0.3%)   0.1594 (  0.5%)   3.4036 (  0.3%)   3.4037 (  4.3%)  discover file objects
2025-09-23T07:38:10.7849368Z    1.4483 (  0.1%)   0.0560 (  0.2%)   1.5043 (  0.1%)   1.5043 (  1.9%)  pre-process profile data
2025-09-23T07:38:10.7849923Z    0.4897 (  0.0%)   0.0000 (  0.0%)   0.4897 (  0.0%)   0.4897 (  0.6%)  process profile data
2025-09-23T07:38:10.7850456Z    0.2412 (  0.0%)   0.1367 (  0.4%)   0.3779 (  0.0%)   0.3779 (  0.5%)  read special sections
2025-09-23T07:38:10.7851005Z    0.0261 (  0.0%)   0.0000 (  0.0%)   0.0261 (  0.0%)   0.0261 (  0.0%)  read debug info
2025-09-23T07:38:10.7851548Z    0.0130 (  0.0%)   0.0001 (  0.0%)   0.0131 (  0.0%)   0.0131 (  0.0%)  process metadata pre-CFG
2025-09-23T07:38:10.7852116Z    0.0130 (  0.0%)   0.0001 (  0.0%)   0.0131 (  0.0%)   0.0131 (  0.0%)  process profile data pre-CFG
2025-09-23T07:38:10.7852699Z    0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  update metadata post-emit
2025-09-23T07:38:10.7853235Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  discover storage
2025-09-23T07:38:10.7853773Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  process section metadata
2025-09-23T07:38:10.7854510Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  process metadata post-CFG
2025-09-23T07:38:10.7855085Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  finalize metadata pre-emit
2025-09-23T07:38:10.7855637Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  update debug info
2025-09-23T07:38:10.7856168Z   1206.2346 (100.0%)  35.1506 (100.0%)  1241.3852 (100.0%)  79.3465 (100.0%)  Total
2025-09-23T07:38:10.7856509Z 
2025-09-23T07:38:10.7856711Z ===-------------------------------------------------------------------------===
2025-09-23T07:38:10.7857160Z                           Binary Function Pass Manager
2025-09-23T07:38:10.7857590Z ===-------------------------------------------------------------------------===
2025-09-23T07:38:10.7858152Z   Total Execution Time: 1226.2137 seconds (64.1746 wall clock)
2025-09-23T07:38:10.7858474Z 
2025-09-23T07:38:10.7858750Z    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
2025-09-23T07:38:10.7859365Z   1162.7142 ( 97.5%)  32.0898 ( 94.1%)  1194.8040 ( 97.4%)  57.2111 ( 89.1%)  split-functions
2025-09-23T07:38:10.7859918Z    1.7854 (  0.1%)   0.0000 (  0.0%)   1.7854 (  0.1%)   1.7855 (  2.8%)  reorder-functions
2025-09-23T07:38:10.7860487Z   12.6568 (  1.1%)   0.0000 (  0.0%)  12.6568 (  1.0%)   0.8433 (  1.3%)  reorder-blocks
2025-09-23T07:38:10.7861025Z    3.5497 (  0.3%)   1.5576 (  4.6%)   5.1073 (  0.4%)   0.8314 (  1.3%)  identical-code-folding
2025-09-23T07:38:10.7861644Z    0.7751 (  0.1%)   0.0000 (  0.0%)   0.7751 (  0.1%)   0.7751 (  1.2%)  profile-quality-stats
2025-09-23T07:38:10.7862170Z    0.5383 (  0.0%)   0.0000 (  0.0%)   0.5383 (  0.0%)   0.5383 (  0.8%)  fix-branches
2025-09-23T07:38:10.7862731Z    0.3774 (  0.0%)   0.0000 (  0.0%)   0.3774 (  0.0%)   0.3774 (  0.6%)  print dyno-stats after optimizations
2025-09-23T07:38:10.7863297Z    0.3492 (  0.0%)   0.0000 (  0.0%)   0.3492 (  0.0%)   0.3492 (  0.5%)  validate-mem-refs
2025-09-23T07:38:10.7863865Z    0.3344 (  0.0%)   0.0000 (  0.0%)   0.3344 (  0.0%)   0.3344 (  0.5%)  set dyno-stats before optimizations
2025-09-23T07:38:10.7864475Z    0.1949 (  0.0%)   0.0000 (  0.0%)   0.1949 (  0.0%)   0.1949 (  0.3%)  simplify-conditional-tail-calls
2025-09-23T07:38:10.7865062Z    0.1915 (  0.0%)   0.0000 (  0.0%)   0.1915 (  0.0%)   0.1915 (  0.3%)  validate-internal-calls
2025-09-23T07:38:10.7865579Z    5.1667 (  0.4%)   0.0000 (  0.0%)   5.1667 (  0.4%)   0.1470 (  0.2%)  aligner
2025-09-23T07:38:10.7866074Z    0.1213 (  0.0%)   0.0000 (  0.0%)   0.1213 (  0.0%)   0.1213 (  0.2%)  inst-lowering
2025-09-23T07:38:10.7866578Z    0.0843 (  0.0%)   0.0000 (  0.0%)   0.0843 (  0.0%)   0.0842 (  0.1%)  strip-rep-ret
2025-09-23T07:38:10.7867092Z    0.0822 (  0.0%)   0.0000 (  0.0%)   0.0822 (  0.0%)   0.0822 (  0.1%)  lower-annotations
2025-09-23T07:38:10.7867604Z    0.4988 (  0.0%)   0.4240 (  1.2%)   0.9228 (  0.1%)   0.0427 (  0.1%)  normalize CFG
2025-09-23T07:38:10.7868114Z    0.7246 (  0.1%)   0.0082 (  0.0%)   0.7328 (  0.1%)   0.0383 (  0.1%)  finalize-functions
2025-09-23T07:38:10.7868648Z    0.8009 (  0.1%)   0.0000 (  0.0%)   0.8009 (  0.1%)   0.0381 (  0.1%)  eliminate-unreachable
2025-09-23T07:38:10.7869194Z    0.5729 (  0.0%)   0.0000 (  0.0%)   0.5729 (  0.0%)   0.0365 (  0.1%)  shorten-instructions
2025-09-23T07:38:10.7869719Z    0.0351 (  0.0%)   0.0000 (  0.0%)   0.0351 (  0.0%)   0.0351 (  0.1%)  clean-mc-state
2025-09-23T07:38:10.7870222Z    0.3754 (  0.0%)   0.0000 (  0.0%)   0.3754 (  0.0%)   0.0343 (  0.1%)  remove-nops
2025-09-23T07:38:10.7870727Z    0.0218 (  0.0%)   0.0001 (  0.0%)   0.0219 (  0.0%)   0.0219 (  0.0%)  assign-sections
2025-09-23T07:38:10.7871247Z    0.1237 (  0.0%)   0.0170 (  0.0%)   0.1407 (  0.0%)   0.0185 (  0.0%)  loop-inversion-opt
2025-09-23T07:38:10.7871785Z    0.0146 (  0.0%)   0.0000 (  0.0%)   0.0146 (  0.0%)   0.0145 (  0.0%)  estimate-edge-counts
2025-09-23T07:38:10.7872301Z    0.0145 (  0.0%)   0.0000 (  0.0%)   0.0145 (  0.0%)   0.0145 (  0.0%)  print-stats
2025-09-23T07:38:10.7872835Z    0.0135 (  0.0%)   0.0000 (  0.0%)   0.0135 (  0.0%)   0.0135 (  0.0%)  patch-entries
2025-09-23T07:38:10.7873360Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  retpoline-insertion
2025-09-23T07:38:10.7873869Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  inlining
2025-09-23T07:38:10.7874358Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  reorder-data
2025-09-23T07:38:10.7874883Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  PLT call optimization
2025-09-23T07:38:10.7875415Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  tail duplication
2025-09-23T07:38:10.7875972Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  frame-optimizer
2025-09-23T07:38:10.7876478Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  peepholes
2025-09-23T07:38:10.7876979Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  alloc-combiner
2025-09-23T07:38:10.7877616Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  indirect-call-promotion
2025-09-23T07:38:10.7878209Z   1192.1171 (100.0%)  34.0967 (100.0%)  1226.2137 (100.0%)  64.1746 (100.0%)  Total
2025-09-23T07:38:10.7878551Z 
2025-09-23T07:38:10.7878737Z ===-------------------------------------------------------------------------===
2025-09-23T07:38:10.7879171Z                                   CG breakdown
2025-09-23T07:38:10.7879629Z ===-------------------------------------------------------------------------===
2025-09-23T07:38:10.7880128Z   Total Execution Time: 0.8442 seconds (0.8442 wall clock)
2025-09-23T07:38:10.7880432Z 
2025-09-23T07:38:10.7880643Z    ---User Time---   --User+System--   ---Wall Time---  --- Name ---
2025-09-23T07:38:10.7881157Z    0.8442 (100.0%)   0.8442 (100.0%)   0.8442 (100.0%)  Callgraph construction
2025-09-23T07:38:10.7881627Z    0.8442 (100.0%)   0.8442 (100.0%)   0.8442 (100.0%)  Total
2025-09-23T07:38:10.7881902Z

And here is the log for the Rust compiler's shared library:

rustc

2025-09-23T08:08:24.1087661Z [2025-09-23T08:08:24.108Z INFO  opt_dist::exec] Executing `/rustroot/bin/llvm-bolt /tmp/.tmp7rsDA1 -data /tmp/tmp-multistage/opt-artifacts/rustc-bolt.profdata -o /checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/librustc_driver-37c25f9240306b8c.so -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -split-strategy=cdsplit -split-all-cold -jump-tables=move -icf=all -update-debug-sections -dyno-stats --time-rewrite --time-opts [at /checkout/obj]`
2025-09-23T08:08:24.1162714Z BOLT-INFO: shared object or position-independent executable detected
2025-09-23T08:08:24.1167794Z BOLT-INFO: Target architecture: x86_64
2025-09-23T08:08:24.1168182Z BOLT-INFO: BOLT version: <unknown>
2025-09-23T08:08:24.1168539Z BOLT-INFO: first alloc address is 0x0
2025-09-23T08:08:24.1169073Z BOLT-INFO: creating new program header table at address 0x5000000, offset 0x5000000
2025-09-23T08:08:24.1169599Z BOLT-INFO: enabling relocation mode
2025-09-23T08:08:24.3930106Z BOLT-INFO: enabling lite mode
2025-09-23T08:08:25.0832384Z BOLT-WARNING: split function detected on input : d_type.cold. The support is limited in relocation mode
2025-09-23T08:08:27.2217568Z BOLT-WARNING: Failed to analyze 216 relocations
2025-09-23T08:08:27.2420728Z BOLT-INFO: pre-processing profile using branch profile reader
2025-09-23T08:08:39.2601621Z BOLT-WARNING: 10 collisions detected while hashing binary objects. Use -v=1 to see the list.
2025-09-23T08:08:40.7672287Z BOLT-INFO: 14020 out of 73549 functions in the binary (19.1%) have non-empty execution profile
2025-09-23T08:08:40.7673057Z BOLT-INFO: 496 functions with profile could not be optimized
2025-09-23T08:08:40.7673519Z BOLT-INFO: profile for 1 objects was ignored
2025-09-23T08:08:41.5617188Z BOLT-INFO: profile quality metrics for the hottest 1000 functions (reporting top 5% values): function CFG discontinuity 0.00%; call graph flow conservation gap 0.00%; CFG flow conservation gap 0.00% (weighted) 0.00% (worst); exception handling usage 0.00% (of total BBEC) 0.00% (of total InvokeEC)
2025-09-23T08:08:42.4075121Z BOLT-INFO: 830299 instructions were shortened
2025-09-23T08:08:42.4612088Z BOLT-INFO: removed 1400 empty blocks
2025-09-23T08:08:42.4612532Z BOLT-INFO: merged 3 duplicate CFG edges
2025-09-23T08:08:43.1085506Z BOLT-INFO: ICF folded 71 out of 73966 functions in 3 passes. 13 functions had jump tables.
2025-09-23T08:08:43.1086714Z BOLT-INFO: Removing all identical functions will save 33.68 KB of code space. Folded functions were called 83349861 times based on profile.
2025-09-23T08:08:44.2241731Z BOLT-INFO: basic block reordering modified layout of 8895 functions (63.45% of profiled, 12.04% of total)
2025-09-23T08:08:45.6003056Z BOLT-INFO: splitting separates 19348618 hot bytes from 9477858 cold bytes (67.12% of split functions is hot).
2025-09-23T08:08:45.6137762Z BOLT-INFO: 118 Functions were reordered by LoopInversionPass
2025-09-23T08:17:09.2383905Z BOLT-INFO: splitting separates 12219489 hot bytes from 7571490 cold bytes (61.74% of split functions is hot).
2025-09-23T08:17:09.6368551Z BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:
2025-09-23T08:17:09.6370124Z 
2025-09-23T08:17:09.6370507Z         159303000032 : executed forward branches
2025-09-23T08:17:09.6370968Z          13512704701 : taken forward branches
2025-09-23T08:17:09.6371346Z          22109772505 : executed backward branches
2025-09-23T08:17:09.6371727Z          15474886893 : taken backward branches
2025-09-23T08:17:09.6372118Z           7312349150 : executed unconditional branches
2025-09-23T08:17:09.6372699Z          10414141994 : all function calls
2025-09-23T08:17:09.6373049Z           5212041960 : indirect calls
2025-09-23T08:17:09.6373384Z            157126340 : PLT calls
2025-09-23T08:17:09.6373718Z        1299789320113 : executed instructions
2025-09-23T08:17:09.6374097Z         327477301380 : executed load instructions
2025-09-23T08:17:09.6374499Z         183484733372 : executed store instructions
2025-09-23T08:17:09.6374959Z           3252587661 : taken jump table branches
2025-09-23T08:17:09.6375346Z                    0 : taken unknown indirect branches
2025-09-23T08:17:09.6375717Z         188725121687 : total branches
2025-09-23T08:17:09.6376050Z          36299940744 : taken branches
2025-09-23T08:17:09.6376498Z         152425180943 : non-taken conditional branches
2025-09-23T08:17:09.6376899Z          28987591594 : taken conditional branches
2025-09-23T08:17:09.6377281Z         181412772537 : all conditional branches
2025-09-23T08:17:09.6377523Z 
2025-09-23T08:17:09.6377706Z         150062838844 : executed forward branches (-5.8%)
2025-09-23T08:17:09.6378130Z           7662406296 : taken forward branches (-43.3%)
2025-09-23T08:17:09.6378553Z          31348251587 : executed backward branches (+41.8%)
2025-09-23T08:17:09.6378981Z          14962033590 : taken backward branches (-3.3%)
2025-09-23T08:17:09.6379428Z           6073487992 : executed unconditional branches (-16.9%)
2025-09-23T08:17:09.6379854Z          10414141994 : all function calls (=)
2025-09-23T08:17:09.6380214Z           5212041960 : indirect calls (=)
2025-09-23T08:17:09.6380562Z            157126340 : PLT calls (=)
2025-09-23T08:17:09.6380925Z        1293805780658 : executed instructions (-0.5%)
2025-09-23T08:17:09.6381339Z         327477301380 : executed load instructions (=)
2025-09-23T08:17:09.6381748Z         183484733372 : executed store instructions (=)
2025-09-23T08:17:09.6382150Z           3252587661 : taken jump table branches (=)
2025-09-23T08:17:09.6382549Z                    0 : taken unknown indirect branches (=)
2025-09-23T08:17:09.6382926Z         187484578423 : total branches (-0.7%)
2025-09-23T08:17:09.6383286Z          28697927878 : taken branches (-20.9%)
2025-09-23T08:17:09.6383697Z         158786650545 : non-taken conditional branches (+4.2%)
2025-09-23T08:17:09.6384153Z          22624439886 : taken conditional branches (-22.0%)
2025-09-23T08:17:09.6384584Z         181411090431 : all conditional branches (-0.0%)
2025-09-23T08:17:09.6384859Z 
2025-09-23T08:17:09.8918378Z BOLT-INFO: SCTC: patched 33 tail calls (33 forward) tail calls (0 backward) from a total of 33 while removing 0 double jumps and removing 33 basic blocks totalling 165 bytes of code. CTCs total execution count is 1454562 and the number of times CTCs are taken is 1450251
2025-09-23T08:17:18.0775912Z BOLT-INFO: setting _end to 0x7ad9420
2025-09-23T08:17:18.0928802Z BOLT-INFO: setting __hot_start to 0x5200000
2025-09-23T08:17:18.0929438Z BOLT-INFO: setting __hot_end to 0x69c5e9b
2025-09-23T08:17:19.7001113Z ===-------------------------------------------------------------------------===
2025-09-23T08:17:19.7001661Z                                  Rewrite passes
2025-09-23T08:17:19.7002090Z ===-------------------------------------------------------------------------===
2025-09-23T08:17:19.7002612Z   Total Execution Time: 4849.9231 seconds (527.8837 wall clock)
2025-09-23T08:17:19.7003190Z 
2025-09-23T08:17:19.7003481Z    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
2025-09-23T08:17:19.7004150Z   2411.0373 ( 99.3%)  2421.3456 (100.0%)  4832.3830 ( 99.6%)  510.3416 ( 96.7%)  run optimization passes
2025-09-23T08:17:19.7004733Z    6.2426 (  0.3%)   0.2655 (  0.0%)   6.5081 (  0.1%)   6.5083 (  1.2%)  emit and link
2025-09-23T08:17:19.7005265Z    5.2722 (  0.2%)   0.5675 (  0.0%)   5.8397 (  0.1%)   5.8411 (  1.1%)  disassemble functions
2025-09-23T08:17:19.7006101Z    2.6959 (  0.1%)   0.1529 (  0.0%)   2.8488 (  0.1%)   2.8489 (  0.5%)  discover file objects
2025-09-23T08:17:19.7006765Z    1.4865 (  0.1%)   0.0403 (  0.0%)   1.5268 (  0.0%)   1.5269 (  0.3%)  pre-process profile data
2025-09-23T08:17:19.7007318Z    0.5127 (  0.0%)   0.0000 (  0.0%)   0.5127 (  0.0%)   0.5127 (  0.1%)  process profile data
2025-09-23T08:17:19.7007852Z    0.1676 (  0.0%)   0.1084 (  0.0%)   0.2759 (  0.0%)   0.2760 (  0.1%)  read special sections
2025-09-23T08:17:19.7008378Z    0.0114 (  0.0%)   0.0000 (  0.0%)   0.0114 (  0.0%)   0.0114 (  0.0%)  read debug info
2025-09-23T08:17:19.7008986Z    0.0084 (  0.0%)   0.0000 (  0.0%)   0.0084 (  0.0%)   0.0084 (  0.0%)  process metadata pre-CFG
2025-09-23T08:17:19.7009552Z    0.0084 (  0.0%)   0.0000 (  0.0%)   0.0084 (  0.0%)   0.0084 (  0.0%)  process profile data pre-CFG
2025-09-23T08:17:19.7010162Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  discover storage
2025-09-23T08:17:19.7010696Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  process section metadata
2025-09-23T08:17:19.7011256Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  update metadata post-emit
2025-09-23T08:17:19.7011824Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  process metadata post-CFG
2025-09-23T08:17:19.7012387Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  finalize metadata pre-emit
2025-09-23T08:17:19.7012931Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  update debug info
2025-09-23T08:17:19.7013477Z   2427.4429 (100.0%)  2422.4802 (100.0%)  4849.9231 (100.0%)  527.8837 (100.0%)  Total
2025-09-23T08:17:19.7013821Z 
2025-09-23T08:17:19.7014007Z ===-------------------------------------------------------------------------===
2025-09-23T08:17:19.7014456Z                           Binary Function Pass Manager
2025-09-23T08:17:19.7014889Z ===-------------------------------------------------------------------------===
2025-09-23T08:17:19.7015393Z   Total Execution Time: 4832.3549 seconds (510.3134 wall clock)
2025-09-23T08:17:19.7015715Z 
2025-09-23T08:17:19.7015979Z    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
2025-09-23T08:17:19.7016604Z   2372.4561 ( 98.4%)  2411.8820 ( 99.6%)  4784.3382 ( 99.0%)  502.1853 ( 98.4%)  split-functions
2025-09-23T08:17:19.7017165Z    1.9135 (  0.1%)   0.0000 (  0.0%)   1.9135 (  0.0%)   1.9135 (  0.4%)  reorder-functions
2025-09-23T08:17:19.7017703Z    4.9072 (  0.2%)   6.8726 (  0.3%)  11.7798 (  0.2%)   1.0785 (  0.2%)  identical-code-folding
2025-09-23T08:17:19.7018235Z    0.8752 (  0.0%)   0.0000 (  0.0%)   0.8752 (  0.0%)   0.8752 (  0.2%)  fix-branches
2025-09-23T08:17:19.7018760Z    0.8417 (  0.0%)   0.0000 (  0.0%)   0.8417 (  0.0%)   0.8445 (  0.2%)  profile-quality-stats
2025-09-23T08:17:19.7019295Z   10.8591 (  0.5%)   0.0000 (  0.0%)  10.8591 (  0.2%)   0.5774 (  0.1%)  reorder-blocks
2025-09-23T08:17:19.7019816Z    0.4994 (  0.0%)   0.0000 (  0.0%)   0.4994 (  0.0%)   0.4994 (  0.1%)  validate-mem-refs
2025-09-23T08:17:19.7020386Z    0.3845 (  0.0%)   0.0000 (  0.0%)   0.3845 (  0.0%)   0.3845 (  0.1%)  print dyno-stats after optimizations
2025-09-23T08:17:19.7020953Z   10.1393 (  0.4%)   2.5906 (  0.1%)  12.7299 (  0.3%)   0.3762 (  0.1%)  finalize-functions
2025-09-23T08:17:19.7021519Z    0.3394 (  0.0%)   0.0000 (  0.0%)   0.3394 (  0.0%)   0.3394 (  0.1%)  set dyno-stats before optimizations
2025-09-23T08:17:19.7022194Z    0.2545 (  0.0%)   0.0000 (  0.0%)   0.2545 (  0.0%)   0.2545 (  0.0%)  simplify-conditional-tail-calls
2025-09-23T08:17:19.7022784Z    0.2519 (  0.0%)   0.0000 (  0.0%)   0.2519 (  0.0%)   0.2519 (  0.0%)  validate-internal-calls
2025-09-23T08:17:19.7023321Z    0.1601 (  0.0%)   0.0000 (  0.0%)   0.1601 (  0.0%)   0.1601 (  0.0%)  inst-lowering
2025-09-23T08:17:19.7023844Z    0.1249 (  0.0%)   0.0000 (  0.0%)   0.1249 (  0.0%)   0.1249 (  0.0%)  lower-annotations
2025-09-23T08:17:19.7024353Z    3.9011 (  0.2%)   0.0000 (  0.0%)   3.9011 (  0.1%)   0.1101 (  0.0%)  aligner
2025-09-23T08:17:19.7024889Z    0.1056 (  0.0%)   0.0000 (  0.0%)   0.1056 (  0.0%)   0.1056 (  0.0%)  strip-rep-ret
2025-09-23T08:17:19.7025398Z    0.0489 (  0.0%)   0.0000 (  0.0%)   0.0489 (  0.0%)   0.0489 (  0.0%)  clean-mc-state
2025-09-23T08:17:19.7025945Z    0.9365 (  0.0%)   0.0000 (  0.0%)   0.9365 (  0.0%)   0.0400 (  0.0%)  eliminate-unreachable
2025-09-23T08:17:19.7026494Z    0.7548 (  0.0%)   0.0000 (  0.0%)   0.7548 (  0.0%)   0.0363 (  0.0%)  shorten-instructions
2025-09-23T08:17:19.7027053Z    0.5528 (  0.0%)   0.0000 (  0.0%)   0.5528 (  0.0%)   0.0294 (  0.0%)  normalize CFG
2025-09-23T08:17:19.7027558Z    0.4854 (  0.0%)   0.0000 (  0.0%)   0.4854 (  0.0%)   0.0243 (  0.0%)  remove-nops
2025-09-23T08:17:19.7028068Z    0.0154 (  0.0%)   0.0000 (  0.0%)   0.0154 (  0.0%)   0.0154 (  0.0%)  assign-sections
2025-09-23T08:17:19.7028630Z    0.1778 (  0.0%)   0.0000 (  0.0%)   0.1778 (  0.0%)   0.0136 (  0.0%)  loop-inversion-opt
2025-09-23T08:17:19.7029144Z    0.0085 (  0.0%)   0.0000 (  0.0%)   0.0085 (  0.0%)   0.0085 (  0.0%)  print-stats
2025-09-23T08:17:19.7029669Z    0.0084 (  0.0%)   0.0000 (  0.0%)   0.0084 (  0.0%)   0.0084 (  0.0%)  estimate-edge-counts
2025-09-23T08:17:19.7030197Z    0.0077 (  0.0%)   0.0000 (  0.0%)   0.0077 (  0.0%)   0.0077 (  0.0%)  patch-entries
2025-09-23T08:17:19.7030725Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  retpoline-insertion
2025-09-23T08:17:19.7031283Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  indirect-call-promotion
2025-09-23T08:17:19.7031838Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  PLT call optimization
2025-09-23T08:17:19.7032355Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  inlining
2025-09-23T08:17:19.7032867Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  tail duplication
2025-09-23T08:17:19.7033378Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  peepholes
2025-09-23T08:17:19.7033877Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  reorder-data
2025-09-23T08:17:19.7034390Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  frame-optimizer
2025-09-23T08:17:19.7034909Z    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  alloc-combiner
2025-09-23T08:17:19.7035446Z   2411.0097 (100.0%)  2421.3452 (100.0%)  4832.3549 (100.0%)  510.3134 (100.0%)  Total
2025-09-23T08:17:19.7035797Z 
2025-09-23T08:17:19.7035986Z ===-------------------------------------------------------------------------===
2025-09-23T08:17:19.7036419Z                                   CG breakdown
2025-09-23T08:17:19.7036838Z ===-------------------------------------------------------------------------===
2025-09-23T08:17:19.7037419Z   Total Execution Time: 1.1101 seconds (1.1101 wall clock)
2025-09-23T08:17:19.7037725Z 
2025-09-23T08:17:19.7037936Z    ---User Time---   --User+System--   ---Wall Time---  --- Name ---
2025-09-23T08:17:19.7038452Z    1.1101 (100.0%)   1.1101 (100.0%)   1.1101 (100.0%)  Callgraph construction
2025-09-23T08:17:19.7038921Z    1.1101 (100.0%)   1.1101 (100.0%)   1.1101 (100.0%)  Total
2025-09-23T08:17:19.7039198Z

maksfb · 2025-09-23T18:38:07Z

Thanks for sharing the logs. Most of the time is spent in function-splitting optimizations, because --split-strategy=cdsplit is used.

While the CDSplit algorithm can produce the best layout, its improvement over the "regular" function split rarely exceeds 0.5%. Until we speed up CDSplit, it's a tradeoff between build time and application performance. Were you able to measure an improvement from using CDSplit? If not, you can speed up BOLT processing significantly by disabling it.

I also recommend using --icf=safe over --icf=all.

maksfb · 2025-09-23T19:39:34Z

@Mark-Simulacrum, thanks for identifying the bottleneck and the fix!

A slightly cleaner change would be to make fixBranches() iterate over basic blocks in Layout order using iterators, and add FunctionLayout::getBasicBlockAfter(FunctionLayout::block_iterator, bool). Additionally, we can also refactor the existing getBasicBlockAfter() to call the new interface. Do you want to take a stab at it? Otherwise I can work on it. Thanks!
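To illustrate why iterating in layout order with iterators removes the quadratic cost, here is a minimal sketch. It is not BOLT code: `Block`, `Layout`, `nextBlockLinear`, and `nextBlockAfter` are hypothetical stand-ins, and the sketch ignores fragments and the `IgnoreSplits` flag that the real `FunctionLayout` has to handle.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-ins for illustration; NOT BOLT's real classes.
struct Block {
  int Id;
};
using Layout = std::vector<const Block *>;

// Pointer-based lookup: the block's position must be searched for first,
// so every call costs O(n) in the number of blocks.
const Block *nextBlockLinear(const Layout &L, const Block *BB) {
  auto It = std::find(L.begin(), L.end(), BB);
  if (It == L.end() || std::next(It) == L.end())
    return nullptr;
  return *std::next(It);
}

// Iterator-based lookup: the position is already known, so this is O(1).
// Precondition: It != L.end().
const Block *nextBlockAfter(const Layout &L, Layout::const_iterator It) {
  auto Next = std::next(It);
  return Next == L.end() ? nullptr : *Next;
}

// Counts blocks that have a successor in layout order; O(n^2) overall.
std::size_t visitQuadratic(const Layout &L) {
  std::size_t Pairs = 0;
  for (const Block *BB : L)
    if (nextBlockLinear(L, BB))
      ++Pairs;
  return Pairs;
}

// Same traversal driven by iterators; O(n) overall.
std::size_t visitLinear(const Layout &L) {
  std::size_t Pairs = 0;
  for (auto It = L.begin(); It != L.end(); ++It)
    if (nextBlockAfter(L, It))
      ++Pairs;
  return Pairs;
}
```

Both traversals visit the same (block, next block) pairs; only the cost of finding the successor differs, which is what matters for functions with very many basic blocks.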

Mark-Simulacrum · 2025-09-23T20:17:08Z

I'd prefer to leave it to you; I'm not comfortable enough editing this code that significantly (both because of C++ and because I'm unfamiliar with the surrounding code). Happy to close this PR, or to have you merge it (I don't have permission to do so) and then build on top of it.

Kobzol · 2025-09-23T21:50:29Z

> Thanks for sharing the logs. Most of the time is spent in function-splitting optimizations, because --split-strategy=cdsplit is used.
>
> While the CDSplit algorithm can produce the best layout, its improvement over the "regular" function split rarely exceeds 0.5%. Until we speed up CDSplit, it's a tradeoff between build time and application performance. Were you able to measure an improvement from using CDSplit? If not, you can speed up BOLT processing significantly by disabling it.
>
> I also recommend using --icf=safe over --icf=all.

It was actually suggested by Amir 😆 (rust-lang/rust#119418). We saw relatively decent max RSS wins with CDSplit.

Iterator implementation of PR llvm#156243: This improves BOLT runtime when optimizing rustc_driver.so from 15 minutes to 7 minutes (or 49 minutes to 37 minutes of userspace time). Co-authored-by: Mark-Simulacrum <mark.simulacrum@gmail.com>

maksfb · 2025-09-23T22:14:49Z

> It was actually suggested by Amir 😆 (rust-lang/rust#119418). We saw relatively decent max RSS wins with CDSplit.

Nice! How confident are you in RSS wins? This comes as a surprise and I'd like to know what happened once it's confirmed. The improvement in CPU cycles is expected though.

Kobzol · 2025-09-23T22:22:27Z

Very confident, it was across the board and we don't see these kinds of numbers out of the blue.

maksfb · 2025-09-23T22:43:38Z

It's possible that "warm" code that gets isolated by CDSplit is not touched by your benchmarks. Do you know if in terms of absolute numbers RSS wins are comparable to the size of .text.warm in the binary?

Kobzol · 2025-09-25T09:04:47Z

The size of .text.warm back then was 841 250 B (~822 KiB). The absolute max RSS wins are shown here. They are in the range of ~2 MiB to 30 MiB.

maksfb · 2025-09-26T20:32:54Z

Thanks for checking.

30 MiB is hard to explain, unless it's a number gathered across multiple processes, e.g. in the case of a parallel build. If it's the number for a single process, it must be some side effect. I find it hard to believe it's a direct effect of a code layout optimization.

maksfb · 2025-09-26T20:35:44Z

Did you ever try these options: --use-old-text --no-huge-pages --align-text=64? They might help to reduce the binary size and hopefully RSS as well. If BOLT produces a warning that it cannot fit .text, you can try adding --align-functions=2 on top of that.

Kobzol · 2025-09-27T07:46:33Z

Well, hard to say then. Tbh the main issue we have with BOLT is not RSS or even the BOLT optimization time, but the fact that it doubles the binary size of the optimized artifacts, due to --use-old-text not working properly. But that's another topic.

I suppose that this PR can be closed, now that the change has landed in #160407?

Kobzol · 2025-09-27T07:56:51Z

We haven't tried this exact combination; I'll try it, thanks. The issue we had with --use-old-text was that it was "unstable": in one commit the old text segment was reused, and in the next it wasn't, so the binary size was jumping all over the place. We therefore just disabled --use-old-text.

aaupov · 2025-09-27T08:15:56Z

> I suppose that this PR can be closed, now that the change has landed in #160407?

Yes, the quadratic behavior in cdsplit is addressed by #160407.

Mark-Simulacrum requested review from aaupov, maksfb, rafaelauler, ayermolo, yota9, paschalis-mpeis and yozhu as code owners August 31, 2025 13:14

llvmbot added the BOLT label Aug 31, 2025

paschalis-mpeis reviewed Sep 5, 2025

[BOLT] Optimize basic block loops to avoid n^2 loop

480d48e

This improves BOLT runtime when optimizing rustc_driver.so from 15 minutes to 7 minutes (49 minutes to 37 minutes of userspace time).

Mark-Simulacrum force-pushed the bolt-opt-next-bb branch from f833920 to 480d48e Compare September 5, 2025 22:54

paschalis-mpeis reviewed Sep 8, 2025

[BOLT] Split getBasicBlocksAfter cache into a distinct function

db72cc2

This enables future re-use in other code that calls getBasicBlockAfter in loops, though for now those uses aren't introduced.
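As a rough illustration of the caching idea behind this commit, the sketch below precomputes each block's successor in layout order once, so later lookups inside a loop no longer search the layout. It is not the code from the PR: `Block`, `Layout`, `NextBlockCache`, `buildNextBlockCache`, and `cachedNext` are hypothetical names, and fragment handling is left out.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; NOT BOLT's real classes.
struct Block {
  int Id;
};
using Layout = std::vector<const Block *>;
using NextBlockCache = std::unordered_map<const Block *, const Block *>;

// Single pass over the layout: map every block to the block that follows
// it, or to nullptr for the last block.
NextBlockCache buildNextBlockCache(const Layout &L) {
  NextBlockCache Cache;
  Cache.reserve(L.size());
  for (std::size_t I = 0; I < L.size(); ++I)
    Cache[L[I]] = (I + 1 < L.size()) ? L[I + 1] : nullptr;
  return Cache;
}

// Average O(1) lookup; returns nullptr for the last block and for blocks
// that are not in the cache.
const Block *cachedNext(const NextBlockCache &Cache, const Block *BB) {
  auto It = Cache.find(BB);
  return It == Cache.end() ? nullptr : It->second;
}
```

A cache like this is only valid while the layout is unchanged; any code that reorders or inserts blocks would have to rebuild it.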

maksfb mentioned this pull request Sep 23, 2025

[BOLT] Avoid n^2 complexity in fixBranches(). NFCI #160407

Merged

aaupov closed this Sep 27, 2025
